A Probabilistic Geocoding System based on a National Address File

Authors

  • Peter Christen
  • Alan Willmore
Abstract

It is estimated that between 80% and 90% of governmental and business data collections contain address information. Geocoding – the process of assigning geographic coordinates to addresses – is becoming increasingly important in many application areas that involve the analysis and mining of such data. In many cases, address records are captured and/or stored in a free-form or inconsistent manner. This fact complicates the task of robustly matching such addresses to spatially annotated reference data. In this paper we describe a geocoding system that is based on a comprehensive high-quality geocoded national address database. It uses a learning address parser based on hidden Markov models to separate free-form addresses into components, and a rule-based matching engine to determine the best set of candidate matches to a reference file. The geocoding software modules are implemented (as part of the Febrl open source data linkage system) in the object-oriented language Python, which allows rapid prototype development and testing.
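
To make the parsing step in the abstract more concrete, the following is a minimal sketch of how a hidden Markov model can segment a free-form address into components using Viterbi decoding. It is not the Febrl implementation: the state names, the three token classes (`NUM`, `WORD`, `TYPE`), the `STREET_TYPES` lookup and every probability below are hypothetical values chosen by hand for illustration, whereas a learning parser such as the one described would estimate them from tagged training addresses.

```python
import math

# Hypothetical hidden states (address components) for illustration only.
STATES = ["number", "street_name", "street_type", "locality"]

# Hand-picked toy probabilities; a trained parser would learn these from data.
START = {"number": 0.8, "street_name": 0.15, "street_type": 0.0, "locality": 0.05}
TRANS = {
    "number":      {"number": 0.05, "street_name": 0.9, "street_type": 0.0, "locality": 0.05},
    "street_name": {"number": 0.0, "street_name": 0.2, "street_type": 0.7, "locality": 0.1},
    "street_type": {"number": 0.0, "street_name": 0.0, "street_type": 0.0, "locality": 1.0},
    "locality":    {"number": 0.0, "street_name": 0.0, "street_type": 0.0, "locality": 1.0},
}
EMIT = {
    "number":      {"NUM": 0.95, "WORD": 0.04, "TYPE": 0.01},
    "street_name": {"NUM": 0.05, "WORD": 0.85, "TYPE": 0.10},
    "street_type": {"NUM": 0.0, "WORD": 0.1, "TYPE": 0.9},
    "locality":    {"NUM": 0.0, "WORD": 0.9, "TYPE": 0.1},
}

# Assumed lookup list of street-type words (not taken from the paper).
STREET_TYPES = {"street", "st", "road", "rd", "avenue", "ave"}


def tag_token(token):
    """Map a raw token to a coarse observation class."""
    if token.isdigit():
        return "NUM"
    if token in STREET_TYPES:
        return "TYPE"
    return "WORD"


def log(p):
    """Log-probability that maps zero to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations):
    """Return the most likely state sequence for a non-empty observation list."""
    # Each row maps state -> (best log-probability, predecessor state).
    rows = [{s: (log(START[s]) + log(EMIT[s][observations[0]]), None) for s in STATES}]
    for obs in observations[1:]:
        row = {}
        for s in STATES:
            score, prev = max((rows[-1][p][0] + log(TRANS[p][s]), p) for p in STATES)
            row[s] = (score + log(EMIT[s][obs]), prev)
        rows.append(row)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: rows[-1][s][0])
    path = [state]
    for row in reversed(rows[1:]):
        state = row[state][1]
        path.append(state)
    return path[::-1]


def parse_address(text):
    """Tokenise an address and label each token with an address component."""
    tokens = text.lower().replace(",", " ").split()
    if not tokens:
        return []
    observations = [tag_token(t) for t in tokens]
    return list(zip(tokens, viterbi(observations)))


if __name__ == "__main__":
    for token, component in parse_address("42 Miller Street, Canberra"):
        print(f"{token:10s} {component}")
```

In a full pipeline of the kind the abstract outlines, the labelled components would then be passed to the matching engine and compared against the reference file; that stage is not sketched here.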

Similar resources

Comparing a single-stage geocoding method to a multi-stage geocoding method: how much and where do they disagree?

BACKGROUND Geocoding methods vary among spatial epidemiology studies. Errors in the geocoding process and differential match rates may reduce study validity. We compared two geocoding methods using 8,157 Washington State addresses. The multi-stage geocoding method implemented by the state health department used a sequence of local and national reference files. The single-stage method used a sin...

Assessing quality improvement initiatives when expert judgements are uncertain

A new approach for examining quality improvement initiatives regarding errors in the U.S. Census Bureau’s Master Address File (MAF) and the Topologically Integrated Geographic and Referencing System (TIGER) databases is presented. A stochastic multi-criteria decision-making method involving Bayesian weighted hierarchical multinomial logit models is used to conduct inference on the priorities in...

A comparison of address point, parcel and street geocoding techniques

The widespread availability of powerful geocoding tools in commercial GIS software and the interest in spatial analysis at the individual level have made address geocoding a widely employed technique in many different fields. The most commonly used approach to geocoding employs a street network data model, in which addresses are placed along a street segment based on a linear interpolation of t...

Using an Optimized Chinese Address Matching Method to Develop a Geocoding Service: A Case Study of Shenzhen, China

With the coming era of big data and the rapid development and widespread applications of Geographical Information Systems (GISs), geocoding technology is playing an increasingly important role in bridging the gap between non-spatial data resources and spatial data in various fields. However, Chinese geocoding faces great challenges because of the complexity of the address string format in Chine...

Accuracy of two geocoding methods for geographic information system-based exposure assessment in epidemiological studies

BACKGROUND Environmental exposure assessment based on Geographic Information Systems (GIS) and study participants' residential proximity to environmental exposure sources relies on the positional accuracy of subjects' residences to avoid misclassification bias. Our study compared the positional accuracy of two automatic geocoding methods to a manual reference method. METHODS We geocoded 4,247...

Publication date: 2004